MolRec at CLEF 2012 - Overview and Analysis of Results
Abstract
We present the results and analysis of our chemical structure recognition system, MolRec, in the CLEF 2012 chemical structure recognition task. MolRec analyses a diagram image, extracts vectorised components from it and applies a rule-based system to construct an internal representation of the chemical structure. This internal representation can then be exported to MOL or SMILES format. The task assigned in CLEF was to analyse two sets of chemical diagram images clipped from patent documents. The first set comprises 865 diagram images, whose results could be evaluated automatically using OpenBabel. The second set is a more challenging collection of 95 images that include elements not supported by OpenBabel and therefore had to be evaluated manually. On the first set, MolRec achieved recognition rates of between 94.91% and 96.18% over four runs with slightly different parameters. On the more exacting second set, MolRec's recognition rate was between 46.32% and 58.95%. Overall, the results testified to high performance on a large sample of quite complex diagrams, but also to the challenges posed by the more difficult images that appear in real patent documents.
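As a rough illustration of how the reported percentages relate to image counts, the sketch below recovers the number of correctly recognised images in the 95-image manual set from the two quoted rates. The helper names `recognition_rate` and `counts_matching` are hypothetical, and the resulting counts are inferred from the arithmetic rather than stated in the abstract.

```python
def recognition_rate(correct: int, total: int) -> float:
    """Percentage of images recognised correctly, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 2)


def counts_matching(rate_percent: float, total: int) -> list:
    """All correct-image counts whose rounded rate equals rate_percent."""
    return [c for c in range(total + 1)
            if recognition_rate(c, total) == rate_percent]


# Size of the manually evaluated set, as given in the abstract.
MANUAL_SET_SIZE = 95

# Counts consistent with the lowest and highest reported rates (inferred).
low = counts_matching(46.32, MANUAL_SET_SIZE)
high = counts_matching(58.95, MANUAL_SET_SIZE)
print(low, high)
```

Under these assumptions, the spread between the weakest and strongest run on the manual set corresponds to a difference of about a dozen images.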
Similar resources
Performance of MolRec at TREC 2011 Overview and Analysis of Results
Chemical molecular diagrams are commonly found in documents from the chemical and life science disciplines. We present an overview of the elements of these diagrams and of MolRec, our system for analysing and recognising them. MolRec uses a number of techniques to refine the scanned images and precisely detect line segments and line junctions, structural elements and the atomic formulae that co...
An Overview of the Traditional Authorship Attribution Subtask
This paper describes the Traditional Authorship Attribution subtask of the PAN/CLEF 2012 workshop. As a followup to our subtask at PAN/CLEF 2011 (Amsterdam), we established a new corpus for analysis for 2012 (Rome). The new corpus differed in several ways from the previous subtask: – Both the number and size of documents were decreased – The documents were taken from a different genre (fiction,...
CLEF-IP 2012: Retrieval Experiments in the Intellectual Property Domain
The Clef-Ip test collection was first made available in 2009 to support research in IR methods in the intellectual property domain. Since then several kinds of tasks, reflecting various specific parts of patent experts' workflows, have been organized. We give here an overview of the tasks, topics, assessments and evaluations of the Clef-Ip 2012 lab.
CLEF 2004: Ad Hoc Track Overview and Results Analysis
We describe the objectives and organization of the CLEF 2004 ad hoc track and discuss the main characteristics of the experiments. The results are analyzed and commented and their statistical significance is investigated. The paper concludes with some observations on the impact of the CLEF campaign on the state-of-the-art in cross-language information retrieval.
Patent Terminology Analysis: Passage Retrieval Experiments for the Intellectual Property Track at CLEF
In 2012, the University of Hildesheim participated in the CLEF-IP claims-to-passage task. Four runs were submitted and different approaches tested. The tested approaches included a language-independent trigram search approach, one approach formulating a query in the source language only and another approach with queries translated to English, German, French and Spanish. The results were not satisfa...